Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[@inline] annotation behaves incorrectly on 4.08 (flambda only) #8563

Closed
smuenzel-js opened this issue Apr 1, 2019 · 28 comments · Fixed by #11383
Closed

[@inline] annotation behaves incorrectly on 4.08 (flambda only) #8563

smuenzel-js opened this issue Apr 1, 2019 · 28 comments · Fixed by #11383
Assignees

Comments

@smuenzel-js
Copy link
Contributor

The following program, when compiled with flambda on 4.08 (and trunk), does not respect the [@inline never] annotation. Cmm is below.
When raise_stuff is exported by the signature, the [@inline never] annotation works.

This only happens when the raise_stuff functions is used exactly once.

module X : sig
  val run_function : int -> int 
end = struct
  let [@inline never] raise_stuff () =
    raise Exit

  let run_function x =
    if x <> 0
    then raise_stuff ()
    else x
end

with 4.07:

(function{/tmp/inline_stuff.ml:6,34-53} camlInline_stuff__raise_stuff_6
     (param/1448: val)
 (raise_notrace{/tmp/inline_stuff.ml:7,4-14} "camlStdlib__Pmakeblock_2739"))

(function{/tmp/inline_stuff.ml:9,19-71} camlInline_stuff__run_function_17
     (x/1451: val)
 (if (!= x/1451 1)
   (app{/tmp/inline_stuff.ml:11,9-23} "camlInline_stuff__raise_stuff_6" 1a
     val)
   x/1451))

with 4.08:

(function{/tmp/inline_stuff.ml:9,19-71} camlInline_stuff__run_function_6
     (x/91: val)
 (if (!= x/91 1)
   (raise_notrace{/tmp/inline_stuff.ml:7,4-14} "camlStdlib__Pmakeblock_1820")
   x/91))

@smuenzel-js
Copy link
Contributor Author

cc @mshinwell

@xclerc
Copy link
Contributor

xclerc commented Apr 1, 2019

@alainfrisch

This is caused by the Simplif.simplify_local_functions optimization.
It does not happen with clambda because the lambdas are different:

; clambda
(seq
  (seq
    (let
      (raise_stuff/81 =
         (function param/82 never_inline (raise (field 2 (global Stdlib!)))))
      (setfield_ptr(root-init) 1 (global J!) raise_stuff/81))
    (let
      (run_function/83 =
         (function x/84[int] : int
           (if (!= x/84 0) (apply (field 1 (global J!)) 0a) x/84)))
      (setfield_ptr(root-init) 2 (global J!) run_function/83))
    0a)
  (let (X/80 = (makeblock 0 (field 2 (global J!))))
    (seq (setfield_ptr(root-init) 0 (global J!) X/80) 0a)))

vs

; flambda
(let
  (X/80 =
     (module-defn(X/80) j.ml(1):0-191
       (let
         (raise_stuff/81 =
            (function param/82 never_inline
              (raise (field 2 (global Stdlib!))))
          run_function/83 =
            (function x/84[int] : int
              (if (!= x/84 0) (apply raise_stuff/81 0a) x/84)))
         (makeblock 0 run_function/83))))
  (makeblock 0 X/80))

Thus making the function local under flambda, but non-local under
clambda. The following diff fixes the issue:

diff --git a/bytecomp/simplif.ml b/bytecomp/simplif.ml
index d57171e8b..fcde28c97 100644
--- a/bytecomp/simplif.ml
+++ b/bytecomp/simplif.ml
@@ -755,8 +755,9 @@ let simplify_local_functions lam =
   in
   let enabled = function
     | {local = Always_local; _}
-    | {local = Default_local; inline = (Never_inline | Default_inline); _}
+    | {local = Default_local; inline = (Default_inline); _}
       -> true
+    | {local = Default_local; inline = (Never_inline); _}
     | {local = Default_local; inline = (Always_inline | Unroll _); _}
     | {local = Never_local; _}
       -> false

However, there is probably a reason why the optimization was
enabled for functions marked Nerver_inline.

@xclerc
Copy link
Contributor

xclerc commented Apr 1, 2019

After re-reading #2143, it occurs to me that it might be an intended change.
In which case, the posted code should add another attribute to disable the
optimization; that is:

module X : sig
  val run_function : int -> int 
end = struct
  let [@local never] [@inline never] raise_stuff () =
    raise Exit

  let run_function x =
    if x <> 0
    then raise_stuff ()
    else x
end

@alainfrisch
Copy link
Contributor

The discussion on #2143 suggested that "local" and "inline" should be considered as independent. inline always implicitly disables the new optimization if not explicitly forced (otherwise it would inhibit actual inlining), but inline never doesn't, because the new optimization is not considered to be a form of inlining. One could use inline never to express the intention that the function body should never be duplicated, and the new optimization complies with this constraint.

So, yes, I'd say the current behavior is fine, and if one wants to disable both the inlining and the "local" optimization, one should use both attributes.

That said, if someone wants to propose that inline never should also disable the new optimization, I'm not strongly opposed to it.

@mshinwell
Copy link
Contributor

mshinwell commented Apr 1, 2019

I looked at #2169 to refresh my memory about what functions this transformation applied to, and didn't really find out the answer. I think it might be beneficial to improve the documentation.

I think the current behaviour is unfortunate given a fair amount of code exists that assumes "inline never" disallows substitution of a function body (whether taking into account contextual information or not). I wouldn't be surprised if this is going to cause performance regressions. It also seems overly verbose to have to write two separate attributes to produce a straightforward behaviour.

I'm not quite convinced whether making "inline never" disable this optimisation is the right thing to do, although it seems like it might be a way out of this; I think something should be done. @chambart Do you have an opinion on this? @lpw25 is away for some time.

@smuenzel-js
Copy link
Contributor Author

I would like to propose that :)

Obviously I was surprised by the behaviour, but that alone shouldn't justify it.

The main use of [@inline never] that I've seen in the wild is the optimization that I added in my code snippet: you don't want to include a large chunk of code that constructs an infrequently used exception in the main body of your code. (by large, I mean the usual code section used to construct a human-readable exception).
In performance sensitive-applications, this additional unused code included in the function has a noticeable impact.

Now, we could add [@inline never] [@local never] everywhere, but that seems a bit heavyweight.

I wasn't familiar with the definition of inlining used in #2143, my working definition was: "Inlining is the pasting of a function's body at the place of usage"

@alainfrisch
Copy link
Contributor

I was initially in favor of reusing the inline attribute, because I agreed with defining inlining as "pasting of a function's body at the place of usage", but @lpw25 argued that the new transformation "is of a fundamentally different nature" (and thus there is no reason inline never would disable it). I'm tempted to let our inlining experts decide what's best to do, but if they consider that the new transformation is not a form of inlining, and there is a use case for disabling both this one and inlinine, the solution should be to add a new attribute that express the exact intention of not allowing the code for the function's body to be moved anywhere else.

@mshinwell
Copy link
Contributor

@alainfrisch I was wondering about adding such an attribute. However I'm not really sure what a sensible migration path would be.

@gasche
Copy link
Member

gasche commented Apr 1, 2019

Some extra points because the discussion was too simple:

  • Given the history of automatic-patching tools in the Janestreet codebase, it is probably possible for you to automatically rewrite all [@inline never] into [@inline never][@local never] if you wish to. (This can remove some optimization opportunities, but is is also safe.)

  • It's not completely clear to me what the best design, starting from scratch, would be. Complete independence of the two options could be nice, if we could trust that the local-function optimization preserves the [@inline always] optimization even when it fires, and that inliners down the road correctly inline. (But this doesn't work today because the inliner only looks at function definitions, not at staticraise sites, if I understand correctly.) In an ideal world that would be my favorite from-scratch design.

  • All things being equal, we should be in favor of not breaking user's code, unless there is an easy fix. (See the point on automated code rewrite for a potential easy fix.)

  • This is not specific to the new [@local] optimization: as we improve our optimization capabilities, we are going to find more and more cases where new releases pessimizes code. (I would be surprised if all the cool stuff in flambda2 didn't suffer from the same potential pain point, even though the people working on it seem to have a reasonable intuition of how "predictable" various optimization changes can be.)

  • I would be curious to see performance numbers on the situations that are affected by this change. (Also, would a different treatment of staticraise in the linearization pass improve numbers here?)

@xavierleroy
Copy link
Contributor

My gut feeling is that the new Lambda simplification is function inlining: inling of sorts, inlining broadly speaking, but inlining nonetheless. Likewise, I would expect that [@inline never] turns the simplification off.

@xavierleroy xavierleroy added this to the 4.08 milestone Apr 1, 2019
@alainfrisch
Copy link
Contributor

So shouldn't we revisit the decision to have a dedicated attribute, instead of supporting [@inline local] (or whatever) to force the new optimization?

@gasche
Copy link
Member

gasche commented Apr 1, 2019

So:

  • [@inline always] performs duplication of the function instead of the non-duplicating localization optimization we are discussing
  • [@inline local] corresponds to the behavior used by default when the optimization applies (but it is more explicit about the fact that the user cares about the optimization), and is an error when the optimization does not apply (in particular, annoting a local function that escapes with [@inline local] is an error, it does not mean "localize only local uses")
  • [@inline never] disables all optimizations

That works for me and solves the problem in this PR.

@alainfrisch
Copy link
Contributor

This is close to the original design in #2143 (with the [@inline static] terminology). I think that the only difference is that [@inline always] did not disallow the "localization" optimization if it was applicable, but I can see that people might want to force the duplicating (+specialization) behavior instead of the localization. I'm happy to implement that if there is a consensus, but it might be slightly bad style to do that without giving @lpw25 a chance to argue.

@lpw25
Copy link
Contributor

lpw25 commented Apr 1, 2019

I'm not sure that conflating [@local] with [@inline] is the right thing to do here. There are a number of aspects to inlining and only some of them apply here. Inlining is primarily about specialising the body of a function to its arguments at a specific call site, and potentially duplicating the body to do so.

Here we are instead detecting that one of the function's parameters -- its return continuation -- is always the same and specialising the body of the function to the value of that parameter without duplicating the body. It also allows the body to be specialised to the other parameters if any of them are always equal, but only if the continuation parameter is always equal. In this sense it is a somewhat strange optimisation -- it would probably be cleaner in the long run to allow any of the parameters to be specialised regardless of whether the continuation parameter can be.

One thing that is not clear to me from the discussion so far is, where was staticcatch itself inlined? Is this just done by linearize, or is there some other optimisation earlier that inlines them? Arguably, it is this step which should be prevented by [@never inline]. That would probably give good behaviour for the example given -- if the staticcatch were still there then presumably it would be given its own label at the end of the function and so kept out of the hot path of the code.

To a certain extent, the problem here is that the annotation [@inline never] is not clear enough -- it is an instruction about how a specific optimisation should behave, not a description of the property that the programmer desires to hold. In this particular case something like [@cold] would be more suitable -- indicating that the code is on the cold path and should be kept out of the way to avoid polluting the instruction cache on the hot path. With something like that the compiler could make informed decisions about which optimisations to apply -- but as it is we can't really because there is more than one reason to use [@inline never] and they should be handled differently by other optimisations.

@gasche
Copy link
Member

gasche commented Apr 1, 2019

(I had written a message with the exact same suggestion of [@cold] as Leo, and I second the suggestion.)

@lpw25
Copy link
Contributor

lpw25 commented Apr 1, 2019

Another relevant issue here: if the code of raise_stuff were recursive then it would probably also require [@specialise never] in order to maintain the desired property. So we should look at how that annotation relates to the others as well.

@alainfrisch
Copy link
Contributor

[@cold] looks like a good idea, but probably not for 4.08.

In the short term, I can see three directions:

  • [@inline never] implicitly forces [@local never]

  • We go back to the initial idea of using the @inline attribute to control the new optimization (with a new "local" payload).

  • Do nothing for now (and let users that really need the "cold" semantics add [@local never] where needed).

I'm fine with any of these solutions.

@lpw25
Copy link
Contributor

lpw25 commented Apr 6, 2019

The first and third of those options are probably fine with me, but I would appreciate if someone could answer my question:

One thing that is not clear to me from the discussion so far is, where was staticcatch itself inlined? Is this just done by linearize, or is there some other optimisation earlier that inlines them?

before making a decision (I'll be back from vacation next week and can check myself then if people are happy to wait).

A simple version of [@cold] to mean [@inline never][@specialize never][@local never] can be done as a ppx, and I think that would be fine for the uses of [@inline never] at Jane St. So we're probably fine with the "do nothing" solution.

@trefis
Copy link
Contributor

trefis commented Apr 17, 2019

A simple version of [@cold] to mean [@inline never][@specialize never][@local never] can be done as a ppx, and I think that would be fine for the uses of [@inline never] at Jane St. So we're probably fine with the "do nothing" solution.

Confirmed. Since this issue seems to have stalled, we (jane street) are in the process of writing that ppx.

@alainfrisch
Copy link
Contributor

One thing that is not clear to me from the discussion so far is, where was staticcatch itself inlined? Is this just done by linearize, or is there some other optimisation earlier that inlines them?

In the example given here, inlining is done at the Lambda level (Simplif.simlify_exits -- "(* Inline handler if there is a single occurrence and it is not nested within an inner try..with *)).

@alainfrisch alainfrisch removed this from the 4.08 milestone Apr 17, 2019
@alainfrisch
Copy link
Contributor

Considering the latest notes, I've dropped the consider-for-release label and the 4.08 milestone. I'm keeping the issue open for now, as a reminder about the idea of the cold annotation.

@lpw25
Copy link
Contributor

lpw25 commented Apr 17, 2019

In the example given here, inlining is done at the Lambda level

That suggests another thing we could do here: adding an inline_attribute to Lstaticcatch and having Simplif.simplify_exits respect it. Then the local transformation would simply copy the inline_attribute from the function to the Lstaticcatch.

@github-actions
Copy link

github-actions bot commented May 6, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 6, 2020
@lpw25
Copy link
Contributor

lpw25 commented May 6, 2020

I think that we are still not in an ideal state for this issue. Two more things that should probably be done:

  1. Add the [@cold] annotation to the compiler. We've been using a simple ppx to implement this at Jane St. but it would be better if the compiler supported it. It is superior to the [@inline never][@local never][@specialize never] that it represents because it allows the programmer to specify their actual intent.
  2. Add an inlining_attribute to Lstaticcatch, have simplify_exits respect it, and have the local optimisation fill it from the original function. This should make [@inline never] behave as expected with the local optimisation. It will also be useful for flambda 2.0, which performs transformations along the same lines as simplify_exits.

@lpw25 lpw25 self-assigned this May 6, 2020
@github-actions github-actions bot closed this as completed Jun 8, 2020
@smuenzel-js
Copy link
Contributor Author

can we reopen this issue?

@nojb nojb reopened this Jun 8, 2020
@nojb nojb removed the Stale label Jun 8, 2020
@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions
Copy link

github-actions bot commented Jul 1, 2022

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Jul 1, 2022
@lthls
Copy link
Contributor

lthls commented Jul 1, 2022

On the flambda-backend repo, I've restricted the [@local] optimisation to only apply if the function definition and the call sites are in the same function (it's part of ocaml-flambda/flambda-backend#537). If that is seen as a suitable fix, I can make a PR here.

@github-actions github-actions bot removed the Stale label Jul 4, 2022
emillon added a commit to emillon/dune that referenced this issue Jul 19, 2022
Closes ocaml#2424

The `[@@Cold]` annotation mentioned in the issue is not part of the
compiler, but is implemented in `ppx_cold`. The effect of the patch is
that it reorders some functions in the binary.
See discussion in ocaml/ocaml#8563.

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to ocaml/dune that referenced this issue Jul 28, 2022
Closes #2424

The `[@@Cold]` annotation mentioned in the issue is not part of the
compiler, but is implemented in `ppx_cold`. The effect of the patch is
that it reorders some functions in the binary.
See discussion in ocaml/ocaml#8563.

Signed-off-by: Etienne Millon <me@emillon.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants